Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Runtime APIs for node-side code #1401

Merged
merged 12 commits into from
Jul 17, 2020
Merged

Runtime APIs for node-side code #1401

merged 12 commits into from
Jul 17, 2020

Conversation

rphmeier
Copy link
Contributor

No description provided.

@rphmeier rphmeier added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Jul 13, 2020
@github-actions github-actions bot added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Jul 13, 2020
@rphmeier rphmeier added A3-in_progress Pull request is in progress. No review needed at this stage. A0-please_review Pull request needs code review. and removed A0-please_review Pull request needs code review. A3-in_progress Pull request is in progress. No review needed at this stage. labels Jul 14, 2020
@rphmeier
Copy link
Contributor Author

As an integration plan, I'd suggest adding these runtime APIs alongside the existing ones, postfixing all the old ones with _Old as a first PR, and then having follow-up code that removes dependence on these deprecated APIs. Then have follow-on PRs for different subsystems to migrate them to the new set of APIs.

@rphmeier rphmeier marked this pull request as draft July 14, 2020 15:39
@rphmeier rphmeier marked this pull request as ready for review July 14, 2020 22:57
@rphmeier rphmeier mentioned this pull request Jul 15, 2020
Yields the validator-set at the state of a given block. This validator set is always the one responsible for backing parachains in the child of the provided block.

```rust
fn validators() -> Vec<ValidatorId>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn validators() -> Vec<ValidatorId>;
fn validators(parent_hash) -> Option<Vec<ValidatorId>>;

Without the parent hash, we're not providing a context from which to get a validator set; the runtime can't know whether we're requesting the current set, the set for the next block, or a historical set.

The Option is meant to handle the case that a validator set is unavailable: an invalid or unknown parent_hash, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, by that logic we should make all of these Optional. I would rather bake in an assumption that the parent hash is valid as that's closer to what the underlying code provides. Invalid parent-hashes just return runtime API errors.

impl GroupRotationInfo {
/// Returns the index of the group needed to validate the core at the given index,
/// assuming the given amount of cores/groups.
fn group_for_core(core_index: usize, cores: usize) -> usize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn group_for_core(core_index: usize, cores: usize) -> usize;
fn group_for_core(core_index: ValidatorIndex, core: usize) -> usize;

If we have a ValidatorIndex newtype, we may as well use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But these aren't validator indices, they're core indices. We do have a CoreIndex type, although I would like to keep it private to the scheduler module. It drifted in #1312 ; after #1411 we can move them back a bit.

/// If this core is freed by being timed-out, this is the assignment that is next up on this
/// core. None if there is nothing queued for this core or there is no possibility of timing
/// out.
next_up_on_time_out: Option<ScheduledCore>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two next_up_on fields feels like it introduces the potential for confusion. When would we want next_up_on_time_out to differ from next_up_on_available? If they do differ, is it possible for a core both to timeout and to become available simultaneously? If so, what's the actual next scheduled core?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the runtime has all the answers".

We don't want them to differ, and yes, it is confusing, but this is purely exposing information about the system described in the scheduler module. It would be nice to have a clear "next up" in all cases, but deeper reading on the scheduler module will reveal why that is not possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible for a core both to timeout and to become available simultaneously

No, they are mutually exclusive. These are the two different paths an Occupied core can take to the Free state. However, depending on which path is taken (which is unpredictable as it's based on the view/honesty of the block producer), the scheduling metadata can be different, leading to a different assignment onto the now-Free core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we should always optimize for the next_up_on_available case in Node-side code. The reason being that timeouts are only possible at a small subset of blocks, as they are only triggered within the short span of time directly following a group rotation. And even when timeouts can be triggered, unless validators are offline, they will not be reached before availability. And if validators are offline, it's fine to degrade throughput of paras somewhat.

However this runtime API is in the interest of making all information available to the node so more advanced strategies can be taken as research evolves.

roadmap/implementers-guide/src/runtime-api/README.md Outdated Show resolved Hide resolved
@@ -7,6 +7,14 @@ In a way, this entire guide is about these candidates: how they are scheduled, c

This section will describe the base candidate type, its components, and variants that contain extra data.

## Para Id

A unique 32-bit identifier referring to a specific para (chain or thread). The relay-chain runtime guarantees that `ParaId`s are unique for the duration of any session, but recycling and reuse over a longer period of time is permitted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter for the purpose of this PR, but I'm curious: are the wrapped u32s hashes of some kind, or handed out sequentially, or what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't described the registrar in the guide yet, but I think it gives them out sequentially. Parachain IDs start at 0 and parathread IDs start with some of the higher bits set, although I don't remember exactly what.

Reuse is fine as long as it's at least a few sessions apart, although it would be best not to reuse until having cycled completely. We could alternatively use a generation/index system like this for handing out IDs to avoid reuse over a long period of time: http://bitsquid.blogspot.com/2014/08/building-data-oriented-entity-system.html

The uniqueness property is what's important here and the details of the registrar are free to change

rphmeier and others added 2 commits July 15, 2020 13:47
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I have a question about error handling though. None of the proposed runtime APIs return a Result. Question: at which layer a storage access error is handled and how or is it always infallible?

/// Returns the validator groups and rotation info localized based on the block whose state
/// this is invoked on. Note that `now` in the `GroupRotationInfo` should be the successor of
/// the number of the block.
fn validator_groups(at: Block) -> (Vec<Vec<ValidatorIndex>>, GroupRotationInfo);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have a type alias for a validator group as well

type ValidatorGroup = Vec<ValidatorIndex>;

/// Fetch the value of the runtime API at the block.
///
/// Definitionally, the `at` parameter cannot be any block that is not in the chain.
/// Thus the return value is unconditional. However, for in-practice implementations
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ordian Does this address your Result question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partially, I should have been more explicit in my question, this was more substrate related question than to this PR, namely how low-level db errors are handled. And from what I see, https://github.com/paritytech/substrate/pull/3997/files, either the errors will be swallowed or a panic will occur. Anyway, this is handled on a different level.

@rphmeier
Copy link
Contributor Author

Will merge this as it is directionally correct and address follow-ons in #1411

@rphmeier rphmeier merged commit d656215 into master Jul 17, 2020
@rphmeier rphmeier deleted the rh-guide-runtime-apis branch July 17, 2020 00:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants